⚡️ Speed up function find_query_preview_references by 14,668%#29
Conversation
The optimization achieves a **14,667% speedup** primarily through **LRU caching** of expensive SQL parsing operations. Here's what changed: **Key Optimizations:** 1. **LRU Cache for SQL Parsing**: Added `@lru_cache(maxsize=64)` to cache `sqlparse.parse()` results, which was the dominant bottleneck (97.6% of original runtime). The same SQL strings are parsed multiple times during recursive traversal of query references. 2. **Cache Table Reference Extraction**: The `extract_table_references` function now uses cached `_cached_extract_table_references` that returns immutable tuples for cache efficiency while maintaining list compatibility for callers. 3. **Eliminated Redundant Object Comparisons**: Replaced the expensive `any(id(variable) == id(ref) for ref in query_preview_references)` check with a simple dictionary key lookup (`if variable_name in query_preview_references`), reducing O(n) iterations. 4. **Minor Micro-optimizations**: Stored `token.ttype` in a local variable to reduce attribute access overhead. **Why This Works:** - **Repeated Parsing**: The line profiler shows `sqlparse.parse()` consuming 99.7% of `is_single_select_query` runtime and 97.6% of `extract_table_references`. Caching eliminates this redundancy. - **Recursive Query Analysis**: When analyzing nested query references, the same SQL strings are parsed multiple times - caching provides exponential benefits. - **Test Results Pattern**: All test cases show 25x-400x improvements, with larger improvements for complex recursive/multiple reference scenarios (up to 45,000x for large-scale tests). **Best Performance Gains**: The optimization excels with repeated query analysis, recursive query references, and large-scale scenarios with many table references - exactly the patterns shown in the test cases where speedups range from 554% to 45,845%.
📝 WalkthroughWalkthroughTwo SQL utility modules were optimized to reduce redundant work through caching. In Pre-merge checks✅ Passed checks (3 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (2)
🧰 Additional context used🪛 Ruff (0.14.4)deepnote_toolkit/sql/sql_utils.py20-20: Missing return type annotation for private function (ANN202) deepnote_toolkit/sql/sql_query_chaining.py216-216: Missing return type annotation for private function (ANN202) 221-221: Do not catch blind exception: (BLE001) 222-222: Unnecessary Rewrite as a literal (C408) 244-244: Possible hardcoded password assigned to: "normalized_token" (S105) 🔇 Additional comments (4)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
📄 14,668% (146.68x) speedup for
find_query_preview_referencesindeepnote_toolkit/sql/sql_query_chaining.py⏱️ Runtime :
86.1 milliseconds→583 microseconds(best of250runs)📝 Explanation and details
The optimization achieves a 14,667% speedup primarily through LRU caching of expensive SQL parsing operations. Here's what changed:
Key Optimizations:
LRU Cache for SQL Parsing: Added
@lru_cache(maxsize=64)to cachesqlparse.parse()results, which was the dominant bottleneck (97.6% of original runtime). The same SQL strings are parsed multiple times during recursive traversal of query references.Cache Table Reference Extraction: The
extract_table_referencesfunction now uses cached_cached_extract_table_referencesthat returns immutable tuples for cache efficiency while maintaining list compatibility for callers.Eliminated Redundant Object Comparisons: Replaced the expensive
any(id(variable) == id(ref) for ref in query_preview_references)check with a simple dictionary key lookup (if variable_name in query_preview_references), reducing O(n) iterations.Minor Micro-optimizations: Stored
token.ttypein a local variable to reduce attribute access overhead.Why This Works:
sqlparse.parse()consuming 99.7% ofis_single_select_queryruntime and 97.6% ofextract_table_references. Caching eliminates this redundancy.Best Performance Gains: The optimization excels with repeated query analysis, recursive query references, and large-scale scenarios with many table references - exactly the patterns shown in the test cases where speedups range from 554% to 45,845%.
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
unit/test_sql_query_chaining.py::TestSqlQueryChaining.test_find_query_preview_references_basicunit/test_sql_query_chaining.py::TestSqlQueryChaining.test_find_query_preview_references_circularunit/test_sql_query_chaining.py::TestSqlQueryChaining.test_find_query_preview_references_nestedunit/test_sql_query_chaining.py::TestSqlQueryChaining.test_find_query_preview_references_no_referencesunit/test_sql_query_chaining.py::TestSqlQueryChaining.test_find_query_preview_references_non_select_query🌀 Generated Regression Tests and Runtime
⏪ Replay Tests and Runtime
test_pytest_testsunittest_xdg_paths_py_testsunittest_jinjasql_utils_py_testsunittest_url_utils_py_testsun__replay_test_0.py::test_deepnote_toolkit_sql_sql_query_chaining_find_query_preview_referencesTo edit these changes
git checkout codeflash/optimize-find_query_preview_references-mhl9wno5and push.Summary by CodeRabbit